1 | Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages

BASE

2 | Preventing Author Profiling through Zero-Shot Multilingual Back-Translation

In: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2021, Punta Cana, Dominican Republic ; https://hal.inria.fr/hal-03350906 (2021)

|
3 |
On the effect of normalization layers on Differentially Private training of deep Neural networks
|
|
|
|
In: https://hal.inria.fr/hal-03475600 ; 2021 (2021)
|
|
BASE
|
|
Show details
|
|
4 | Adapting Language Models When Training on Privacy-Transformed Data

In: INTERSPEECH 2021 ; https://hal.inria.fr/hal-03189354 (2021)

Abstract:
In recent years, voice-controlled personal assistants have revolutionized interaction with smart devices and mobile applications. The spoken input they collect is then used by system providers to improve and retrain the language models (LMs). Since each spoken message can reveal personal information, private data must be removed from the input utterances. However, this removal may harm LM training, because privacy-transformed data is unlikely to match the test distribution. This paper aims to fill that gap by focusing on the adaptation of an LM initially trained on privacy-transformed utterances. Our data sanitization process relies on named-entity recognition. We propose an LM adaptation strategy over the private data with minimal loss. Class-based modeling is an effective approach to overcoming data sparsity in n-gram model training; neural LMs, on the other hand, can handle longer contexts, which can yield better predictions. Our methodology combines the predictive power of class-based models with the generalization capability of neural models. With privacy transformation, we observe a relative 11% word error rate (WER) increase compared to an LM trained on the clean data. Despite the privacy transformation, we can still achieve comparable accuracy: empirical evaluations attain a relative WER improvement of 8% over the initial model.
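The combination of class-based and neural predictions described in the abstract can be sketched as a linear interpolation, one common way to merge the two model families. The class structure, probabilities, and interpolation weight below are toy assumptions for illustration, not values or the exact method from the paper:

```python
# Minimal sketch: interpolating a class-based n-gram LM with a neural LM.
# A class-based LM factorizes P(w | h) = P(c(w) | h) * P(w | c(w)),
# which helps with data sparsity when entities are mapped to classes
# (e.g. during privacy transformation).

def class_based_prob(word, p_class_given_history, p_word_given_class, word_to_class):
    """P(w | h) via the class factorization P(c | h) * P(w | c)."""
    c = word_to_class[word]
    return p_class_given_history[c] * p_word_given_class[(word, c)]

def interpolate(p_class, p_neural, lam=0.5):
    """Linear interpolation of the two models' probabilities."""
    return lam * p_class + (1.0 - lam) * p_neural

# Toy example: predicting "paris" after "i live in", where city names
# were mapped to a CITY class by the sanitization step.
word_to_class = {"paris": "CITY"}
p_class_given_history = {"CITY": 0.30}          # P(CITY | "i live in")
p_word_given_class = {("paris", "CITY"): 0.10}  # P("paris" | CITY)

p_cb = class_based_prob("paris", p_class_given_history, p_word_given_class, word_to_class)
p_nn = 0.05  # stand-in for a neural LM's P("paris" | "i live in")
print(interpolate(p_cb, p_nn, lam=0.5))  # approximately 0.04
```

The interpolation weight would normally be tuned on held-out data; the factorized class model keeps entity probabilities well-defined even when individual names were scrubbed from the training text.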

Keywords: Computation and Language (cs.CL); Machine Learning (cs.LG); class-based language modeling; language model adaptation; privacy-preserving learning; speech recognition

URL: https://hal.inria.fr/hal-03189354/file/Paper_1854.pdf ; https://hal.inria.fr/hal-03189354 ; https://hal.inria.fr/hal-03189354/document

5 | Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

6 | ANEA: Distant Supervision for Low-Resource Named Entity Recognition

8 | Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages

9 | Preventing Author Profiling through Zero-Shot Multilingual Back-Translation

10 | Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces

11 | Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors

12 | On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers

In: Frontiers in Psychology (2021)

13 | Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

In: 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2020, Punta Cana, Dominican Republic ; https://hal.inria.fr/hal-03350901 (2020)

14 | Distant Supervision and Noisy Label Learning for Low-Resource Named Entity Recognition: A Study on Hausa and Yorùbá

In: ICLR Workshops (AfricaNLP & PML4DC 2020), Apr 2020, Addis Ababa, Ethiopia ; https://hal.archives-ouvertes.fr/hal-03359111 (2020)

15 | Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

16 | Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

17 | On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

18 | A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English

19 | A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English